ABSTRACT 



This study compared the performance of English language 
learners and native speakers of English on mathematics word problems from the 
National Assessment of Educational Progress (NAEP) tests and investigated 
whether modifying the linguistic structures in the test items affected 
student test performance. The study began with the analysis of existing NAEP 
data. These analyses strongly suggested that students’ language background 
impacts their performance. Two separate field studies were then conducted. 

For the first, a study of student perceptions, 36 eighth graders from the 
greater Los Angeles area were interviewed. These students were administered 
the original mathematics items and parallel revised items (with simplified 
language) in a structured interview format to investigate their perceptions 
and preferences. In the second field study, the Accuracy Test Study, 1,174 
eighth graders took paper-and-pencil mathematics tests with 10 original NAEP 
items, 10 items with linguistic modifications, and 5 noncomplex control 
items. Overall, results show that students in the English as a Second 
Language categories, especially those in the lower levels, had considerably 
lower mathematics performance than other students. Revising the items to make 
them less linguistically complex helped some students, although such 
improvement was not statistically significant. (Contains 44 references.) 
Confounding of students' performance and their language background variables 

Jamal Abedi 

Introduction 

Literature has drawn attention to the importance of language in student 
performance on assessments in content-based areas such as mathematics (see, for 
example, Abedi, Lord, and Plummer, 1995; Abedi, Lord, and Hofstetter, 1998; Aiken, 
1971; Aiken, 1972; Cocking and Chipman, 1988; De Corte, Verschaffel, and DeWin, 
1985; Jerman and Rees, 1972; Kintsch and Greeno, 1985; Larsen, Parker, and 
Trenholme, 1978; Lepik, 1990; Mestre, 1988; Munro, 1979; Noonan, 1990; Orr, 1987; 
Rothman and Cohen, 1989; Spanos, Rhodes, Dale, and Crandall, 1988). Nationally, 
children perform 10% to 30% worse on arithmetic word problems than on 
comparable problems presented in numeric format (Carpenter, Corbitt, Kepner, 
Lindquist, & Reys, 1980). The discrepancy between performance on verbal and 
numeric format problems strongly suggests that factors other than mathematical 
skill contribute to success in solving word problems (August & Hakuta, 1997; 
Cummins, Kintsch, Reusser, & Weimer, 1988; LaCelle-Peterson & Rivera, 1994; 
Zehler, Hopstock, Fleischman, & Greniuk, 1994). 

English language learner (ELL) students score lower than students who are 
proficient in English on standardized tests of mathematics achievement in 
elementary school, as well as on the SAT and the quantitative and analytical sections 
of the Graduate Record Examination. Although there is no evidence to suggest that 
the basic abilities of ELL students are different from non-ELL students, the 
achievement differences between ELL and non-ELL students are pronounced 
(Cocking & Chipman, 1988; Mestre, 1988). 

This study compared the performance of English language learners and 
native speakers of English on math word problems from NAEP (National 
Assessment of Educational Progress) tests and investigated whether modifying the 
linguistic structures in the test items affected student test performance. 

We started with analyses of existing NAEP data. The results of these 
analyses strongly suggested that students' language background 
affects their performance. To examine this issue, two separate field studies were 
conducted. For the first field study, the Student Perceptions Study, 36 8th-grade 
students were interviewed. These students were given the original math items and 
parallel revised items (with simplified language) in a structured interview format to 
investigate the students' perceptions and preferences. 

In the second field study, the Accuracy Test Study, 1,174 8th-grade students 
took paper-and-pencil math tests including 10 original NAEP math items, 10 items 
with linguistic modifications, and 5 noncomplex control items. Students' scores on 
the original and linguistically modified items were compared. 

This study was conducted in two separate phases: (1) analyses of extant data, and 
(2) field research. 

Phase 1 

In Phase 1 of the study, we examined the NAEP data from the 1990 and 1992 
main assessments. Items from the 8th-grade NAEP math tests and questionnaire 
items were analyzed using a linguistic categorization scheme. A multiple 
discriminant analysis was applied to composite scores to examine the effects of 
language background variables. In this multiple discriminant analysis, language 
background variables were used as grouping variables and composite test scores 
were used as discriminating variables. The results clearly revealed lower math 
proficiency scores for the subjects who predominantly spoke a language other than 
English in the home. This relationship was more evident for longer items, items that 
appear to have higher language load. 

Next, the effect of linguistic complexity on students' performance on NAEP 
math items was analyzed by creating item parcels based on linguistic complexity, 
using pragmatic criteria including difficulty of vocabulary, abstract or culture- 
specific content, and number of complex structures in a sentence. A repeated 
measures design was applied to the parcel scores. The results of the analyses 
conducted on the language background variables showed a highly significant 
difference between the scores of the two parcels. Students who spoke more of a 
language other than English in the home performed significantly lower than 
students who spoke only English in the home, and the difference was greater for the 
linguistically complex items (F(1, 1170) = 56.42, p < .01). 

Lastly, we examined the proportions of omitted or not-reached items by 
students' language background. Groups were formed based upon whether the 
student reported speaking a language other than English in the home "always," 
"sometimes," or "never." The groups were then compared on omitted/not-reached 
items. In nearly all cases, the students who always spoke a language other than 
English in the home had much higher percentages of omitted/not-reached items 
than the students who spoke only English in the home. 
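
The group comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout, the function name `omit_rates`, and the numbers are hypothetical placeholders, not NAEP data or the procedure actually used in the study.

```python
from collections import defaultdict


def omit_rates(records):
    """Return the proportion of omitted/not-reached items per language group.

    Each record is a (group, items_presented, items_omitted_or_not_reached)
    tuple; this layout is an assumption made for the sketch.
    """
    presented = defaultdict(int)
    omitted = defaultdict(int)
    for group, n_items, n_omitted in records:
        presented[group] += n_items
        omitted[group] += n_omitted
    # Pool counts within each group, then divide omitted by presented.
    return {group: omitted[group] / presented[group] for group in presented}


# Placeholder records (NOT study data): group is the student's answer to
# "How often is a language other than English spoken in the home?"
records = [
    ("always", 20, 6),
    ("always", 20, 4),
    ("sometimes", 20, 3),
    ("never", 20, 1),
    ("never", 20, 1),
]

for group, rate in omit_rates(records).items():
    print(f"{group}: {rate:.2f}")
```

Pooling counts within a group before dividing weights each item equally; averaging per-student rates instead would weight each student equally, and the source does not say which was used.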

Phase 2 

In Phase 2 of the study, we examined the role of linguistic complexity in 
students' performance on NAEP math items. Based on the literature and expert 
knowledge, we identified linguistically complex NAEP items. The set of linguistic 
features employed for this phase of the study was limited to features actually 
occurring in the small corpus of released NAEP math items available to us. The 
features chosen included familiarity/frequency of non-math vocabulary, length of 
nominals (noun phrases), voice of verb phrase, conditional clauses, question 
phrases, and abstract or impersonal presentations. We then prepared modified 
versions of these linguistically complex items so that the revised items contained 
simpler language but retained their original math content. The linguistically 
complex items and their revised counterparts were administered to a group of 
mostly 8th-grade students in the greater Los Angeles area to determine whether 
linguistic complexity in fact affected students' math performance. The study's 
item pool was limited to a subset of the 1992 released math items. 

Since the language modification of test items was a major part of this study, 
we include a brief description of this process. 

Modification of Math Items 

The corpus of math items used for this investigation was the 69 released items 
from the 1992 NAEP main math assessment. From the set of linguistic features 
appearing in these items, several features were identified as potentially problematic 
for ELL students. Judgments were based on expert knowledge and on findings of 
previous empirical studies (including, among others, Adams, 1990; Bever, 1970; 
Biber, 1988; Botel and Granowsky, 1974; Bormuth, 1966; Celce-Murcia and 
Larsen-Freeman, 1983; Gathercole and Baddeley, 1993; Chall, Jacobs, and Baldwin, 1990; 
Forster and Olbrei, 1973; Hunt, 1965, 1977; Jones, 1982; Just and Carpenter, 1980; 
Kane, 1968, 1970; Klare, 1974; Lemke, 1986; MacDonald, 1993; MacGinitie and 
Tretiak, 1971; Paul, Nibbelink, and Hoover, 1986; Pawley and Syder, 1983; Perera, 
1980; Slobin, 1968; and Wang, 1970). 

For those items with language which might be difficult for students, simpler 
versions were drafted, keeping the math task the same, but modifying non-math 
vocabulary and linguistic structures; math terminology was not changed. (Math 
experts checked original and modified versions to ensure that the math content was 
parallel.) Problematic features were removed or recast. For a given math item, 
more than one feature might be revised. Linguistic features that were modified 
included the following (see Abedi, Lord, and Plummer, 1995, for further discussion): 

• familiarity/frequency of non-math vocabulary: unfamiliar or infrequent 
words were changed (census > video game). 

• voice of verb phrase: passive verb forms were changed to active (two 
comparisons were made > she made two comparisons). 

• length of nominals: long nominals were shortened (last year's class vice 
president > vice president). 

• conditional clauses: conditionals were replaced with separate sentences, or 
the order of conditional and main clause was changed (If Lee delivers x 
newspapers, > Lee delivers x newspapers.). 

• relative clauses: removed or recast (A report that contains 64 sheets of paper > 
He needs 64 sheets of paper for each report). 

• question phrases: complex question phrases were changed to simple 
question words (At which of the following times > When). 

• abstract or impersonal presentations: made more concrete (The weights of 3 
objects were compared > Sandra compared the weights of 3 objects). 
Student Perceptions Study 

Three separate studies were conducted in Phase 2 of the project and are 
reported separately. The first study, which will be referred to as the Student 
Perceptions Study, consisted of interviews with a group of 38 8th-grade students in 
the greater Los Angeles area, including native and non-native speakers of English 
with a range of math skill levels. The purpose of the interviews was to investigate 
the hypothesis that linguistically simplified items are, in fact, perceived as easier to 
understand by students. The students were presented the original (linguistically 
complex) math items and their revised (less linguistically complex) counterpart 
items in a structured interview format. Subjects consistently reported a strong 
preference for the revised items over the original items. Student preference for the 
revised items seemed to support the notion that the math items could be 
linguistically simplified in meaningful ways for the test taker. The interview results 
supported our plan to test a larger group of students to determine whether the 
observed differences in student responses to the language of the math items would 
translate into actual differences on math test scores. 

Accuracy Study 

The second study in Phase 2 will be referred to as the Accuracy Study. In this 
study, 39 8th-grade classes (1,031 students) were selected, with oversampling of 
Limited English Proficiency (LEP) students. Released items from the 1992 main 
assessment were then re-examined for linguistic complexity based on the 
information obtained from the Student Perceptions Study. From these released 
items, 20 were identified as linguistically complex and were then modified to reduce 
linguistic complexity. The two sets of items (20 original and 20 revised) were placed 
into two booklets (Form A and Form B) along with 5 linguistically non-complex 
items. In addition to the 25 math items, each booklet contained a 12-item language 
background questionnaire that was specifically designed for the substudy. Also, 
information on students' math background, ESL program participation, and 
socioeconomic status (SES) (as measured by participation in a free lunch program) 
was collected from schools. 

In the data from the Accuracy Study, students' math performance on the 
original and revised items was compared. In general, the results of this study were 
consistent with the literature and indicated that (a) students' backgrounds in math 
(as indicated by the level of math class) had a significant impact on students' math 
scores in this study; (b) students in ESL programs had lower scores in math than 
non-ESL students; (c) males and females performed at about the same level; and (d) 
there were some differences in students' math performance with respect to ethnicity. 

No analyses performed on revised versus original items yielded statistically 
significant results, except for those linked to math class level. However, certain 
trends were observed. As these trends suggested interesting possibilities, we 
investigated them in detail. We computed the percent improvement in students' math 
performance as a result of the revision of math items. For each level of math class, 
percentage of improvement was computed by subtracting the mean of original item 
scores from the mean of the revised item scores for the same set of items and then 
dividing the difference between the two means by the mean of original item scores. 
The revision of items had a differential impact on students' math performance. 
Students in low- and average-level math classes exhibited the greatest improvement. 
The trend decreased over the intermediate to high categories, and for the highest 
level math classes (high math, honors, and algebra) there was no improvement. 
Students in different levels of math classes benefited differently from the revisions. 
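
The percent-improvement computation described above can be written out as a short sketch. The function name `percent_improvement` and the example scores are hypothetical, chosen only to illustrate the arithmetic; they are not data from the study. The result is multiplied by 100 on the assumption that the ratio was reported as a percentage.

```python
from statistics import mean


def percent_improvement(original_scores, revised_scores):
    """Percent improvement of the revised-item mean over the original-item mean.

    Computed as (mean_revised - mean_original) / mean_original * 100,
    for scores on the same set of items within one math class level.
    """
    mean_original = mean(original_scores)
    mean_revised = mean(revised_scores)
    if mean_original == 0:
        raise ValueError("mean of original item scores is zero; ratio is undefined")
    return 100.0 * (mean_revised - mean_original) / mean_original


# Made-up scores for a single math class level (NOT study data).
original = [4, 5, 6, 5]   # mean 5.0
revised = [5, 6, 6, 5]    # mean 5.5

print(round(percent_improvement(original, revised), 1))  # prints 10.0
```

Because the difference is divided by the original-item mean, groups with low baseline scores can show a large percentage gain from a small absolute gain, which is worth keeping in mind when comparing class levels.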
Because of the initially mixed results from the Accuracy Study, we decided 
to perform analyses using a hierarchical linear modeling (HLM) procedure. We 
created two models. In Model 1, we 
used the composite scores of the 10 original items in booklet A as the outcome 
variable; students' membership in native/non-native English speaker groups and 
students' participation in a free lunch program were used as subject-level (level-1) 
variables; and type of math class and an aggregate of free lunch program participation 
were used as level-2 variables. For Model 2, we used the same level-1 
and level-2 variables with the 10 revised items in booklet B (sister items of the 10 
original items in booklet A). A comparison between the two models revealed 
changes/improvements due to revision of items. 

Speed Study 

Based on results from the Accuracy Study, we examined the effect of 
linguistic modifications on the time a student required to answer/complete the math 
test items. Two more booklets were developed for this third study. The 20 original 
items were placed in booklet A and the 20 revised items were placed in booklet B. 
The five non-complex items were eliminated from these booklets. The same 
language background questionnaire that was included in the booklets for the 
Accuracy Study was included in these booklets. One hundred forty-three 
8th-grade students in the greater Los Angeles area were selected (mostly ESL math 
students) because it seemed that those students would benefit more from 
linguistically simplified versions of items. However, some students from high-level 
math classes and algebra classes were also included. Of the 143 students who 
participated in this study, 76 students answered the original items (booklet A) and 
67 answered the revised items (booklet B). Students were given ten minutes to 
answer the 20 math questions. (In contrast, in the Accuracy Study, the majority of 
students were given enough time to answer all 25 questions.) 

Native speakers (M = 4.76, SD = 2.75) performed slightly higher than non- 
native speakers (M = 3.65, SD = 2.25) on the speed test but the difference was not 
significant. However, there were large differences between performances of 
students in different ESL, math class, and school lunch program categories. We 
could not apply analysis of variance in many cases because of extremely 
disproportionate cell frequencies. For those analyses with appropriate frequencies, 
students in different math classes performed differently. For the "low" math class 
category, the mean score was 3.68 (SD = 2.48) and for the "high" math class, the 
mean was 5.18 (SD = 2.56). Analyses of variance revealed no significant differences 
between the subgroups of type of math class (F(2, 64) = 1.76, p = .18). School lunch 
program participation also seemed to have some impact on students' performance 
on the revised items. A range of differing degrees of participation in such programs 
was reported. The greatest degree of involvement was labeled "AFDC", and no 
involvement in such programs was labeled "no lunch code." For categories on this 
variable, means ranged from 3.14 (SD = 1.96) for "no lunch code" to 5.75 (SD = .500) 
for "AFDC." However, ANOVA results yielded no significant results in this case 
(F(2, 59) = 1.03, p = .36). 
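
As a consistency check on the two ANOVA results in this paragraph, the p values can be recomputed from the F statistics, reading them as F = 1.76 with 2 and 64 degrees of freedom and F = 1.03 with 2 and 59 degrees of freedom. The sketch below relies on a closed form for the upper tail of the F distribution that holds only when the numerator degrees of freedom equal 2; the function name is hypothetical and this is not the software used in the study.

```python
def f_upper_tail_df1_2(f_value, df_denominator):
    """Upper-tail probability P(F > f_value) for an F distribution with
    exactly 2 numerator degrees of freedom.

    For numerator df = 2 the tail has the closed form
        (d2 / (d2 + 2 * f)) ** (d2 / 2),
    where d2 is the denominator degrees of freedom. The formula does NOT
    apply to other numerator df.
    """
    if f_value < 0:
        raise ValueError("F statistic must be non-negative")
    if df_denominator <= 0:
        raise ValueError("denominator degrees of freedom must be positive")
    return (df_denominator / (df_denominator + 2.0 * f_value)) ** (df_denominator / 2.0)


# Type of math class: F with 2 and 64 df equal to 1.76
print(round(f_upper_tail_df1_2(1.76, 64), 2))  # prints 0.18

# School lunch program category: F with 2 and 59 df equal to 1.03
print(round(f_upper_tail_df1_2(1.03, 59), 2))  # prints 0.36
```

Both values agree with the reported p = .18 and p = .36.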

Results 

The analyses of NAEP data indicated some effects of students' language 
backgrounds on their math performance in junior high school. When items were 
categorized by their length, students who spoke a language other than English at 
home performed significantly lower than students who always spoke English at 
home; the difference was more pronounced on long items. Analysis also showed 
that the rates of omitted/not-reached math items for non-native English speakers 
were higher than those for native speakers. These results clearly indicate that 
confounding of language and performance occurs on NAEP math items. 

Original and linguistically simplified items were administered in the 
Accuracy Study and the Speed Study. No statistically significant results were found 
overall, but students in low and average math classes scored higher on the 
simplified versions, consistent with similar findings in previous studies. 

A number of problems emerged during the study, including limited access to 
the NAEP item pool, an unequal distribution of items across the NAEP content area 
subscales, and a lack of reliable measures of English proficiency. It was also 
observed that classes that were supposedly linguistically homogeneous were not 
necessarily so; although NAEP policy is to avoid testing ESL students, NAEP 
administrations are in fact testing students whose ability in English may be weak. 

In addition to analyzing and discussing the Language Background 
Questionnaire items independently, we also used these items along with the 
background data gathered from schools in analyzing students' math performances. 

The results of analyses on the language background questions were consistent 
across the two field studies. Following is a summary of some of the findings from 
the language background questions: 

1. Non-native English speakers tend to use their native language more with 
their parents and grandparents than with their siblings and friends. 

2. Beginning ESL students showed more signs of concern in the area of 
understanding, speaking, reading and writing English. 
3. All students reported that they have more problems understanding 
teachers' explanations, textbooks, and the texts of tests in the area of math 
than in the areas of science or social studies. 

4. Native English speakers self-reported a higher level of proficiency in 
English than non-native speakers. 

5. Males and females reported about the same level of proficiency in English 
and the "other language." 

6. The most apparent differences between groups of students were across the 
categories of ESL class placement codes; differences were found in their 
self-reported level of English proficiency (understanding, speaking, 
reading, and writing) and in their understanding of their teachers' 
explanations, textbooks, and the texts of their exams. 

"Beginning ESL" students in most cases reported a considerably lower level of 
English proficiency. However, the number of students in this category was so small 
in many instances that no valid interpretation was possible. 

The most salient results of our analyses were significant differences in 
students' performances across categories of type of math class. When variability 
due to the type of math class was controlled, there was very little variability left to 
warrant further attention. 

For the speed section of the study, there were higher rates of response on the 
revised items. These improvements were more evident with the language minority 
students. Unfortunately, the small number of students in this part of the study did not 
allow us to do any in-depth analyses. 

Conclusion 
The results of our analyses on the original, revised, and total scores in general 
indicated that students in the ESL categories, particularly in the lower levels, show 
considerably lower math performance than other students. This is cause for serious 
concern and requires special attention. There do not seem to be major differences 
between these ESL low-performance groups of students and other groups of 
students based on SES or other variables which could explain such differences. 
Therefore, one must conclude that language is a very important element in such 
cases. That is, language and performance are confounded. The exact nature of the 
confounding factors remains elusive. 

The results of our analyses also suggested that revising math items to make 
them less linguistically complex helped some students, particularly those in low- 
and average-level math classes; since previous studies have shown math and 
reading proficiency to be correlated, it is likely that the reading and language skills 
of many of these students were also at the low or average level. In order to do math 
word problems, students must learn the special vocabulary and structures peculiar 
to the math word problem genre. In addition, general proficiency in language is 
necessary if the student is to learn from teachers and books in the mathematics 
classroom. General proficiency in language is also necessary for a true assessment 
of the student's knowledge in NAEP mathematics tests. Solving math word 
problems presents an additional challenge for the student whose language 
proficiency is limited, and the added cognitive load can impact individual 
performances negatively. Thus, the language of math items may disproportionately 
impact the scores of less language-proficient students, whether they are native 
speakers or non-native speakers. Other approaches emphasizing more 
representational rendition of content might facilitate performance of students with 
lower proficiencies in English. 

Summary 

To summarize, the study clearly shows that ESL students are at a significant 
disadvantage in mathematics content area assessments. We found that there was a 
small overall improvement in math scores on the revised versions of the NAEP math 
items, although such improvement was unimpressive. The lack of statistically 
significant improvement was due, we feel, to a number of limitations, including the 
small size of the item pool available. It remains prudent to continue searching for 
interactions among linguistic and socioeconomic background variables that will 
shed light upon the increasingly important issue of the role of language in content 
area assessment. 